Classification of DNA sequences using Bloom filters
Internal identifier: 002663 (Main/Exploration); previous: 002662; next: 002664
Authors: Henrik Stranneheim [Sweden]; Max Käller [Sweden]; Tobias Allander [Sweden]; Björn Andersson [Sweden]; Lars Arvestad [Sweden]; Joakim Lundeberg [Sweden]
Source:
- Bioinformatics [ 1367-4803 ] ; 2010.
Abstract
Motivation: New-generation sequencing technologies produce increasingly complex datasets that demand new, efficient and specialized sequence analysis algorithms. Often, only the ‘novel’ sequences in a complex dataset are of interest, and the superfluous sequences need to be removed.
Results: A novel algorithm, fast and accurate classification of sequences (FACS), is introduced that can accurately and rapidly classify sequences as belonging or not belonging to a reference sequence. FACS was first optimized and validated using a synthetic metagenome dataset. An experimental metagenome dataset was then used to show that FACS achieves accuracy comparable to that of BLAT and SSAHA2 but is at least 21 times faster in classifying sequences.
Availability: Source code for FACS, the Bloom filters and the MetaSim dataset used are available at http://facs.biotech.kth.se. The Bloom::Faster 1.6 Perl module can be downloaded from CPAN at http://search.cpan.org/~palvaro/Bloom-Faster-1.6/
Contacts: henrik.stranneheim@biotech.kth.se; joakiml@biotech.kth.se
Supplementary information: Supplementary data are available at Bioinformatics online.
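The abstract says only that FACS uses Bloom filters to decide whether a sequence belongs to a reference. A minimal sketch of that general idea follows. It is not the authors' implementation (FACS relies on the Bloom::Faster Perl module). The k-mer length, the filter size, the number of hash functions, the SHA-256 double-hashing scheme, the 0.8 match cutoff and all function names are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter using double hashing (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # May return a false positive, never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer of seq, upper-cased."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_reference_filter(reference: str, k: int,
                           num_bits: int = 1 << 20,
                           num_hashes: int = 4) -> BloomFilter:
    """Insert all k-mers of the reference sequence into a new filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for kmer in kmers(reference, k):
        bf.add(kmer)
    return bf


def match_fraction(read: str, bf: BloomFilter, k: int) -> float:
    """Fraction of the read's k-mers found in the filter (0.0 if none fit)."""
    total = hits = 0
    for kmer in kmers(read, k):
        total += 1
        hits += kmer in bf
    return hits / total if total else 0.0


def classify(read: str, bf: BloomFilter, k: int, cutoff: float = 0.8) -> bool:
    """True if the read is judged to belong to the reference (assumed cutoff)."""
    return match_fraction(read, bf, k) >= cutoff
```

A read cut from the reference always scores 1.0, because a Bloom filter never produces false negatives. Unrelated reads can still score above zero through false positives, which is why a sketch like this needs a cutoff rather than a single-hit rule.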
DOI: 10.1093/bioinformatics/btq230
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000109
- to stream Istex, to step Curation: 000109
- to stream Istex, to step Checkpoint: 000534
- to stream Main, to step Merge: 002688
- to stream Main, to step Curation: 002663
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title>Classification of DNA sequences using Bloom filters</title>
<author><name sortKey="Stranneheim, Henrik" sort="Stranneheim, Henrik" uniqKey="Stranneheim H" first="Henrik" last="Stranneheim">Henrik Stranneheim</name>
</author>
<author><name sortKey="K Ller, Max" sort="K Ller, Max" uniqKey="K Ller M" first="Max" last="K Ller">Max Käller</name>
</author>
<author><name sortKey="Allander, Tobias" sort="Allander, Tobias" uniqKey="Allander T" first="Tobias" last="Allander">Tobias Allander</name>
</author>
<author><name sortKey="Andersson, Bjorn" sort="Andersson, Bjorn" uniqKey="Andersson B" first="Björn" last="Andersson">Björn Andersson</name>
</author>
<author><name sortKey="Arvestad, Lars" sort="Arvestad, Lars" uniqKey="Arvestad L" first="Lars" last="Arvestad">Lars Arvestad</name>
</author>
<author><name sortKey="Lundeberg, Joakim" sort="Lundeberg, Joakim" uniqKey="Lundeberg J" first="Joakim" last="Lundeberg">Joakim Lundeberg</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:9F1C2D341F92D49DE7C480EB2823B669D874C7BC</idno>
<date when="2010" year="2010">2010</date>
<idno type="doi">10.1093/bioinformatics/btq230</idno>
<idno type="url">https://api.istex.fr/ark:/67375/HXZ-XV8ZN2M3-9/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000109</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">000109</idno>
<idno type="wicri:Area/Istex/Curation">000109</idno>
<idno type="wicri:Area/Istex/Checkpoint">000534</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000534</idno>
<idno type="wicri:doubleKey">1367-4803:2010:Stranneheim H:classification:of:dna</idno>
<idno type="wicri:Area/Main/Merge">002688</idno>
<idno type="wicri:Area/Main/Curation">002663</idno>
<idno type="wicri:Area/Main/Exploration">002663</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main">Classification of DNA sequences using Bloom filters</title>
<author><name sortKey="Stranneheim, Henrik" sort="Stranneheim, Henrik" uniqKey="Stranneheim H" first="Henrik" last="Stranneheim">Henrik Stranneheim</name>
<affiliation wicri:level="1"><country xml:lang="fr">Suède</country>
<wicri:regionArea>Science for Life Laboratory, KTH Royal Institute of Technology, SE-100 44 Stockholm, LingVitae AB, Roslagstullsbacken 33, 114 21 Stockholm, Department of Microbiology, Laboratory for Clinical Microbiology, Tumor and Cell Biology, Karolinska University Hospital, Karolinska Institutet, SE-17176 Stockholm, Department of Cell and Molecular Biology, Karolinska Institutet, SE-17177 Stockholm and School of Computer Science and Communication, Stockholm Bioinformatics Center, AlbaNova University Center, Royal Institute of Technology, 106 91 Stockholm</wicri:regionArea>
<wicri:noRegion>106 91 Stockholm</wicri:noRegion>
</affiliation>
<affiliation></affiliation>
</author>
<author><name sortKey="K Ller, Max" sort="K Ller, Max" uniqKey="K Ller M" first="Max" last="K Ller">Max Käller</name>
<affiliation wicri:level="1"><country xml:lang="fr">Suède</country>
<wicri:regionArea>Science for Life Laboratory, KTH Royal Institute of Technology, SE-100 44 Stockholm, LingVitae AB, Roslagstullsbacken 33, 114 21 Stockholm, Department of Microbiology, Laboratory for Clinical Microbiology, Tumor and Cell Biology, Karolinska University Hospital, Karolinska Institutet, SE-17176 Stockholm, Department of Cell and Molecular Biology, Karolinska Institutet, SE-17177 Stockholm and School of Computer Science and Communication, Stockholm Bioinformatics Center, AlbaNova University Center, Royal Institute of Technology, 106 91 Stockholm</wicri:regionArea>
<wicri:noRegion>106 91 Stockholm</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Allander, Tobias" sort="Allander, Tobias" uniqKey="Allander T" first="Tobias" last="Allander">Tobias Allander</name>
<affiliation wicri:level="1"><country xml:lang="fr">Suède</country>
<wicri:regionArea>Science for Life Laboratory, KTH Royal Institute of Technology, SE-100 44 Stockholm, LingVitae AB, Roslagstullsbacken 33, 114 21 Stockholm, Department of Microbiology, Laboratory for Clinical Microbiology, Tumor and Cell Biology, Karolinska University Hospital, Karolinska Institutet, SE-17176 Stockholm, Department of Cell and Molecular Biology, Karolinska Institutet, SE-17177 Stockholm and School of Computer Science and Communication, Stockholm Bioinformatics Center, AlbaNova University Center, Royal Institute of Technology, 106 91 Stockholm</wicri:regionArea>
<wicri:noRegion>106 91 Stockholm</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Andersson, Bjorn" sort="Andersson, Bjorn" uniqKey="Andersson B" first="Björn" last="Andersson">Björn Andersson</name>
<affiliation wicri:level="1"><country xml:lang="fr">Suède</country>
<wicri:regionArea>Science for Life Laboratory, KTH Royal Institute of Technology, SE-100 44 Stockholm, LingVitae AB, Roslagstullsbacken 33, 114 21 Stockholm, Department of Microbiology, Laboratory for Clinical Microbiology, Tumor and Cell Biology, Karolinska University Hospital, Karolinska Institutet, SE-17176 Stockholm, Department of Cell and Molecular Biology, Karolinska Institutet, SE-17177 Stockholm and School of Computer Science and Communication, Stockholm Bioinformatics Center, AlbaNova University Center, Royal Institute of Technology, 106 91 Stockholm</wicri:regionArea>
<wicri:noRegion>106 91 Stockholm</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Arvestad, Lars" sort="Arvestad, Lars" uniqKey="Arvestad L" first="Lars" last="Arvestad">Lars Arvestad</name>
<affiliation wicri:level="1"><country xml:lang="fr">Suède</country>
<wicri:regionArea>Science for Life Laboratory, KTH Royal Institute of Technology, SE-100 44 Stockholm, LingVitae AB, Roslagstullsbacken 33, 114 21 Stockholm, Department of Microbiology, Laboratory for Clinical Microbiology, Tumor and Cell Biology, Karolinska University Hospital, Karolinska Institutet, SE-17176 Stockholm, Department of Cell and Molecular Biology, Karolinska Institutet, SE-17177 Stockholm and School of Computer Science and Communication, Stockholm Bioinformatics Center, AlbaNova University Center, Royal Institute of Technology, 106 91 Stockholm</wicri:regionArea>
<wicri:noRegion>106 91 Stockholm</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Lundeberg, Joakim" sort="Lundeberg, Joakim" uniqKey="Lundeberg J" first="Joakim" last="Lundeberg">Joakim Lundeberg</name>
<affiliation wicri:level="1"><country xml:lang="fr">Suède</country>
<wicri:regionArea>Science for Life Laboratory, KTH Royal Institute of Technology, SE-100 44 Stockholm, LingVitae AB, Roslagstullsbacken 33, 114 21 Stockholm, Department of Microbiology, Laboratory for Clinical Microbiology, Tumor and Cell Biology, Karolinska University Hospital, Karolinska Institutet, SE-17176 Stockholm, Department of Cell and Molecular Biology, Karolinska Institutet, SE-17177 Stockholm and School of Computer Science and Communication, Stockholm Bioinformatics Center, AlbaNova University Center, Royal Institute of Technology, 106 91 Stockholm</wicri:regionArea>
<wicri:noRegion>106 91 Stockholm</wicri:noRegion>
</affiliation>
<affiliation></affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j" type="main">Bioinformatics</title>
<idno type="ISSN">1367-4803</idno>
<idno type="eISSN">1460-2059</idno>
<imprint><publisher>Oxford University Press</publisher>
<date type="published">2010</date>
<date type="e-published">2010</date>
<biblScope unit="vol">26</biblScope>
<biblScope unit="issue">13</biblScope>
<biblScope unit="page" from="1595">1595</biblScope>
<biblScope unit="page" to="1600">1600</biblScope>
</imprint>
<idno type="ISSN">1367-4803</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">1367-4803</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract">Motivation: New-generation sequencing technologies produce increasingly complex datasets that demand new, efficient and specialized sequence analysis algorithms. Often, only the ‘novel’ sequences in a complex dataset are of interest, and the superfluous sequences need to be removed. Results: A novel algorithm, fast and accurate classification of sequences (FACS), is introduced that can accurately and rapidly classify sequences as belonging or not belonging to a reference sequence. FACS was first optimized and validated using a synthetic metagenome dataset. An experimental metagenome dataset was then used to show that FACS achieves accuracy comparable to that of BLAT and SSAHA2 but is at least 21 times faster in classifying sequences. Availability: Source code for FACS, the Bloom filters and the MetaSim dataset used are available at http://facs.biotech.kth.se. The Bloom::Faster 1.6 Perl module can be downloaded from CPAN at http://search.cpan.org/~palvaro/Bloom-Faster-1.6/ Contacts: henrik.stranneheim@biotech.kth.se; joakiml@biotech.kth.se Supplementary information: Supplementary data are available at Bioinformatics online.</div>
</front>
</TEI>
<affiliations><list><country><li>Suède</li>
</country>
</list>
<tree><country name="Suède"><noRegion><name sortKey="Stranneheim, Henrik" sort="Stranneheim, Henrik" uniqKey="Stranneheim H" first="Henrik" last="Stranneheim">Henrik Stranneheim</name>
</noRegion>
<name sortKey="Allander, Tobias" sort="Allander, Tobias" uniqKey="Allander T" first="Tobias" last="Allander">Tobias Allander</name>
<name sortKey="Andersson, Bjorn" sort="Andersson, Bjorn" uniqKey="Andersson B" first="Björn" last="Andersson">Björn Andersson</name>
<name sortKey="Arvestad, Lars" sort="Arvestad, Lars" uniqKey="Arvestad L" first="Lars" last="Arvestad">Lars Arvestad</name>
<name sortKey="K Ller, Max" sort="K Ller, Max" uniqKey="K Ller M" first="Max" last="K Ller">Max Käller</name>
<name sortKey="Lundeberg, Joakim" sort="Lundeberg, Joakim" uniqKey="Lundeberg J" first="Joakim" last="Lundeberg">Joakim Lundeberg</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002663 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 002663 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:9F1C2D341F92D49DE7C480EB2823B669D874C7BC |texte= Classification of DNA sequences using Bloom filters }}
This area was generated with Dilib version V0.6.33.